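```r
# Global chunk options for this R Markdown document.
knitr::opts_chunk$set(echo = TRUE, results = "markup", message = FALSE,
                      warning = FALSE, fig.align = "center", cache = FALSE)
# Discourage scientific notation when printing numbers.
options(scipen = 99)
```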
This project was produced as part of the University of Pennsylvania’s Master of Urban Spatial Analytics Spring 2020 Practicum (MUSA 801) taught by Ken Steif, Michael Fichman, and Matt Harris. We would like to thank X of the City of Louisville for providing feedback and data.
The following document presents an analysis of affordable housing in Louisville, KY, and EquiLiving, an interactive tool for the Louisville Affordable Housing Trust Fund (LAHTF) that will help the LAHTF proactively prioritize affordable housing in neighborhoods with the greatest opportunity before it becomes more expensive to develop in those parts of the city. EquiLiving provides a risk score for each housing market area in the city, indicating each neighborhood's opportunity for affordable housing.
This document begins with a case study predicting home sales prices in Louisville, KY, and is followed by a series of appendices that discuss data wrangling, data visualization, data sources, feature engineering, and model results. Navigate through the document either by using the panel at the left, or by clicking the hyperlinks throughout the document.
Home prices have increased over the last decade throughout Louisville, making it more difficult for low-income families to live in neighborhoods of opportunity: areas with good school systems and low crime and poverty rates. The Louisville Affordable Housing Trust Fund has been working to solve this problem by developing affordable housing in areas that allow low-income families to live in these types of neighborhoods. Because funding for affordable housing is limited, we believe it is extremely important to prioritize where affordable housing is built. Therefore, by analyzing Louisville’s current housing market to predict future home prices, we designed a predictive planning tool that uses future trends to help determine where to build affordable housing.
Louisville, like many American cities, has experienced high levels of development pressure over the years as demand for urban living has increased. Single-family home prices have increased x% since 2008. New construction and rehab permits have increased y% and % respectively since 2008. This new development demand may be making Louisville less affordable for everyone, and in particular for low-income communities and communities of color: Louisville has a shortage of ~30,000 affordable and available housing units for extremely low-income households (at or below 30% AMI) INSERT CITATION. The goal for Louisville is to ensure that, despite recent neighborhood change, Louisville remains a place where all can thrive regardless of race or class.
Among a host of equity-related interventions currently underway in Louisville, the City has created a Louisville Affordable Housing Trust Fund (AHTF) to provide bridge financing for the development of affordable housing. In this project we explore the neighborhood change process in Louisville using parcel- and neighborhood-level administrative data. We then use this information to develop a price index that can forecast home sale prices across space and time. Such forecasts may enable planners to better allocate limited affordable housing resources to communities whose socioeconomic makeup is likely to change substantially in the future.
According to Opportunity Insights, a Harvard University research team, many low-income households are separated into areas with lower economic opportunity (ADD CITATION), decreasing their chances of upward mobility. “Every year a child spends growing up in an area with better outcomes causes the child to have better outcomes in adulthood”. Therefore, having a low-income child grow up in a neighborhood of opportunity, such as a higher-income neighborhood with lower crime rates and better school systems, can give the child a better chance of earning a higher income than his or her parents. This economic challenge can be addressed by developing affordable housing in Louisville’s neighborhoods of opportunity to help low-income families become socially mobile.
The Housing Trust Fund, established in 2008, invests in affordable housing for residents who do not have enough income to afford market-rate housing. Since 2014, the AHTF has allocated over $20 million to fund almost 1,600 units throughout Louisville to reduce the affordable housing shortage. Currently, the fund primarily gives grants and loans for affordable housing projects, including new construction, rehabilitation, and rental assistance, among others, with priority given to projects that integrate low-income housing with market-rate housing. (Source: http://loutrustfund.org/wp-content/uploads/2019/10/2020-Funding-Guidelines-Final.pdf)
There are six primary steps in the AHTF allocation process:
1. Housing Trust Fund Assessment: an annual assessment is completed to determine the demand for affordable housing in Louisville. This report outlines the City’s current housing market conditions to identify the types of housing projects the City needs.
2. Funding: Based on the recommendations from the Housing Needs Assessment Report, the City of Louisville provides an annual budget to the LAHTF.
3. Budget & Allocation: Based on the funding received, an annual budget is created with specific requirements for how the funds will be allocated. For example, the Louisville Metro Government allocated $5 million in funds, with $4.75 million used for development activities, $250,000 for program administration, and $25,000 for supportive housing services.
4. Request for Proposals (RFP): Each year the LAHTF issues an RFP for projects that meet the City’s housing needs. These projects are primarily focused on bridge financing.
5. Review Process: The LAHTF Program Committee, consisting of LAHTF Board Members, Kentucky AHTF experts, the Louisville Metro Housing Authority, and Louisville Metro Government, reviews the proposals. A variety of factors, including historical projects and current affordability trends, are considered, but future changes in home prices are not.
6. Investment: The funds are provided to finance development of the selected affordable housing projects.
Improvement to Allocation Process:
To improve this process, EquiLiving is a planning tool that provides home price predictions to help the Trust Fund identify future changes in neighborhood home prices and better allocate funds for affordable housing.
To predict neighborhoods of opportunity in Louisville, we predict the change in home sale prices throughout Louisville using the repeat sales method, a technique that calculates changes in the sale price of the same property within given timeframes. This relatively simple approach can be used to estimate shifts in home prices over periods ranging from months to years. A major difficulty in generating home price indices is the heterogeneity of real estate units in terms of their location or features. Moreover, the composition of the housing stock and of the homes sold changes over time (months or years), which adds to the difficulty. Hedonic variables such as the built area of the house, the number of bedrooms, or other house characteristics could be used to control for these differences. However, models with hedonic variables can run into problems when data are unavailable or inaccurate. The repeat sales model addresses these problems by including a fixed effect for each house in the dataset, which absorbs the time-invariant characteristics that hedonic variables would otherwise control for. Perhaps the best-known housing index that uses the repeat-sales method is the Case-Shiller National Home Price Index. The Case-Shiller Index measures changes in the value of the U.S. residential housing market by tracking the sale and resale prices of single-family homes that, as the name "repeat sales" suggests, have transacted more than once.
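To make the idea concrete, the sketch below estimates a Bailey-Muth-Nourse style repeat-sales index on simulated data. It is an illustration under stated assumptions, not the project's code. The homes, prices, and eight-quarter "true" index are invented, and each home sells exactly twice. Differencing a home's two log sale prices cancels its time-invariant characteristics, which is the same logic as the house fixed effect described above. For homes sold exactly twice, the first-difference and fixed-effects estimates coincide.

```r
# Repeat-sales (Bailey-Muth-Nourse style) index on simulated data.
# Everything here is hypothetical and for illustration only.
set.seed(42)

n_periods <- 8                                   # hypothetical quarters
true_log_index <- c(0, 0.02, 0.05, 0.04, 0.08, 0.10, 0.13, 0.15)
n_homes <- 500

# Each home sells twice: a first sale, then a later second sale.
first  <- sample(1:(n_periods - 1), n_homes, replace = TRUE)
second <- sapply(first, function(t) {
  s <- (t + 1):n_periods
  s[sample.int(length(s), 1)]                    # safe even when length(s) == 1
})

quality <- rnorm(n_homes, mean = 12, sd = 0.5)   # time-invariant house effect (log $)
log_p1 <- quality + true_log_index[first]  + rnorm(n_homes, sd = 0.05)
log_p2 <- quality + true_log_index[second] + rnorm(n_homes, sd = 0.05)

# Design matrix: -1 in the first-sale period, +1 in the second-sale period.
X <- matrix(0, nrow = n_homes, ncol = n_periods)
X[cbind(seq_len(n_homes), first)]  <- -1
X[cbind(seq_len(n_homes), second)] <-  1
X <- X[, -1]                                     # period 1 is the baseline

# The house effect cancels in the difference, so no hedonic controls are needed.
dlog <- log_p2 - log_p1
fit <- lm(dlog ~ X - 1)

est_log_index <- c(0, unname(coef(fit)))
price_index <- 100 * exp(est_log_index)          # baseline period = 100
round(price_index, 1)
```

Printing `price_index` should show values that track `100 * exp(true_log_index)` closely, up to sampling noise.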
For this analysis, we use housing market areas (HMAs) as our geographic unit of analysis to predict home sale values. Louisville’s 21 housing market areas, defined by Louisville Metro Government based on residential neighborhoods, employment centers, and landmarks, are composed of 2010 census tracts, as shown in the map below. (Source: 2019 Housing Needs Assessment Report) Because the socioeconomic data are only available at the census tract level, part of our exploratory data analysis is at the census tract level. However, we show and analyze trends spatially at the HMA level when possible, since our model predictions are for each HMA. As shown in the map below, the HMAs located inside Interstate 264 are considered the urban core, an urban setting comprising 8 HMAs (Northwest Core, West Core, Southwest Core, Downtown, University, Northeast Core, East Core, Southeast Core), while the non-urban core, a more suburban environment, consists of 13 HMAs. Throughout this study, we compare trends inside and outside the urban core.
The table below lists the datasets used in our analysis: historic home sales from the Louisville Property Value Assessment; permit, foreclosure, and land use data from Open Data Louisville; eviction data from Eviction Lab; American Community Survey estimates from the US Census Bureau; and project data from the Louisville Affordable Housing Trust Fund. For further details on these sources, please see Appendix: Data Dictionary.
| Dataset | Source | Years |
|---|---|---|
| Historic Home Sales | Louisville Property Value Assessment | 1948-2019 |
| Historic Permits | Open Data Louisville | 2003-2019 |
| Property Foreclosures | Open Data Louisville | 2011-2019 |
| Evictions | Eviction Lab | 2000-2016 |
| American Community Survey, 5-Year Estimates | US Census Bureau | 2000-2018 |
| Land Use | Open Data Louisville | NA |
| LAHTF Projects | Louisville Affordable Housing Trust Fund | 2014-2019 |
The data was also “wrangled” before being explored in the following section. This process included various transformations of the data in order to optimize predictive ability of each variable. For details on this procedure, please see Appendix.
The exploratory analysis has two goals: first, to explore where the LAHTF has invested in projects and to reverse-engineer the allocation process of LAHTF investments; second, to explore the process of neighborhood change in Louisville, focusing on development and socioeconomic trends across space.
Through our exploratory analysis, we answer the following questions:
+ Where has Louisville’s Affordable Housing Trust Fund allocated funds, and how has this changed over time?
+ What is the relationship between key variables and neighborhoods receiving LAHTF projects?
+ What are the time and space trends of home prices across Louisville?
+ What are the time and space trends of permits across Louisville?
+ What are the relationships between socioeconomic characteristics and home prices across Louisville?
Between 2014 and 2019, the LAHTF provided $20.3 million for 184 projects, resulting in ~1,600 units. Since 2016, the total amount invested in AHTF projects has increased each year, resulting in more affordable units being developed. The table below provides an annual summary of the LAHTF’s investments.
| Year | Projects | Units | Dollars |
|---|---|---|---|
| 2014 | 12 | 15 | 282 |
| 2015 | 9 | 9 | 152 |
| 2016 | 7 | 7 | 197 |
| 2017 | 37 | 331 | 2390 |
| 2018 | 58 | 734 | 6347 |
| 2019 | 61 | 477 | 9976 |
The maps below show the spatial distribution of AHTF projects, units developed, and dollars invested across Louisville’s housing market areas. Although a majority of the AHTF projects occurred inside the urban core, in its northwest portion, most of these projects resulted in single-family homes, while a greater share of dollars was invested outside the urban core in multi-family units. A couple of neighborhoods in the urban core have multi-family units, but a majority of the multi-family units are outside the urban core.
Louisville’s Affordable Housing Trust Fund allocation process prioritizes projects in neighborhoods based on the following criteria:
* Declining homeownership
* Blocks with multiple vacant or abandoned buildings or lots to develop complete blocks
* Single-family homes in the urban core (typically the urban area with lower income levels)
* Multi-family homes outside the urban core (typically the suburbs with higher income levels)
Comparing neighborhoods with and without AHTF projects provides a basis for understanding the relationships between AHTF investment and explanatory variables. This lets us see which variables are associated with more AHTF projects.
The 2010 mean values of different variables are plotted by group (Affordable Housing vs. No Affordable Housing) to determine which types of neighborhoods received LAHTF investments from 2014 to 2019. These plots provide supporting evidence for the allocation criteria that the LAHTF uses to prioritize investments. Consistent with those criteria, neighborhoods that received AHTF investment had lower homeownership rates in 2010 than neighborhoods that did not. Similarly, neighborhoods receiving AHTF investments had lower median home prices and median incomes, and higher inequality (Gini coefficient) and poverty rates, than neighborhoods that did not receive AHTF investments.
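The group comparison above reduces to averaging each 2010 tract variable separately for tracts with and without an AHTF project. The base-R sketch below shows that step using a handful of hypothetical tracts with invented values. The column names `has_ahtf`, `homeowner_rate`, and `median_income` are placeholders, not the project's variable names.

```r
# Hypothetical 2010 tract data; values are invented for illustration.
tracts <- data.frame(
  has_ahtf       = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  homeowner_rate = c(0.35, 0.45, 0.70, 0.60, 0.65),
  median_income  = c(25000, 31000, 62000, 55000, 71000)
)
tracts$group <- ifelse(tracts$has_ahtf, "Affordable Housing", "No Affordable Housing")

# Mean of each variable within each group (one row per group).
group_means <- aggregate(cbind(homeowner_rate, median_income) ~ group,
                         data = tracts, FUN = mean)
group_means
```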
As shown in the charts below, from 2014 to 2019 AHTF funds were allocated to projects in neighborhoods with a lower percent change in homeownership, a higher percentage of vacant land, lower median household income, higher income inequality, and a higher percentage of Black residents.
Throughout the city, higher sale prices are clustered in the eastern portion of the urban core (inside the thick black outline) and on the eastern periphery of the suburban neighborhoods outside the urban core, as shown in the map below on the left. However, as illustrated in the map on the right, the neighborhoods with lower average sale prices, located in the central portion of the urban core, experienced the greatest increase in price over the decade (2010-2018). Prices in some of these neighborhoods have almost doubled.
Since 2010, [INSERT NEW VALUE] new construction permits have been issued by the City of Louisville’s License and Inspection Department. The number of residential building permits issued has remained consistent over the years, and a majority of them are for single-family homes. Since 2014, the number of single-family permits has increased slightly, while multi-family permits have declined slightly. As of 2018, the City issued almost 17 times as many single-family permits (~1,700) as multi-family permits (~100).
Over the last decade, residential permits for single-family homes have started to shift: the number of new construction permits issued has almost doubled, from ~600 to ~1,100, while rehab permits declined by almost 50%. This indicates an increase in single-family home development in the city since recovering from the recession, while multi-family permit rates have remained relatively consistent, as shown on the bottom right.
Home sale prices vary across space based on the socioeconomic makeup of a neighborhood: typically, higher-income neighborhoods have higher home prices. The same holds true in Louisville, where there is a strong positive relationship between sale prices and both median household income and median rent, as shown below.
Based on past AHTF investments, a majority of these projects are in areas with the lowest average home prices in Louisville, which also have high poverty rates, low percentages of white residents and of residents with bachelor's degrees, and low rents and household incomes. This is illustrated by the green dots in the charts below, which represent census tracts with at least one AHTF project.
The goal of our model is to apply the repeat sales method using monthly periods to predict future home sale prices for housing market areas throughout Louisville (see the modeling strategy section (INSERT LINK TO INTRO) for more details). Similar to the Case-Shiller National Home Price Index, this method tracks the sale and resale prices of single-family homes that have transacted more than once. In this section, we discuss the data features used in the model, the development of the price index for Louisville, validation of the model results, and home sale price predictions for 2019-2023.
First, we created date features such as year, quarter, and months_since_00 using the lubridate package in R. months_since_00 is the number of months that have elapsed since January 1, 2000. The data used to build the model range from January 2000 to December 2018; the 2019 data were incomplete and were therefore excluded. Additionally, sales below $5,000 or above $2,500,000 were treated as outliers and removed.
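The date features and price filter described above can be sketched in base R as follows. This is an illustration under assumptions, not the project's code. The text used lubridate, whereas this version uses `format()` on `Date` objects so it runs without extra packages. The column names and records are hypothetical. It also counts January 2000 as month 0 of `months_since_00`, which may differ from the project's convention.

```r
# Hypothetical sales records (column names are assumptions).
sales <- data.frame(
  parcel_id = c("A", "A", "B", "C", "D"),
  sale_date = as.Date(c("2000-01-15", "2006-07-02", "2012-03-30",
                        "2019-02-11", "2015-11-20")),
  price     = c(85000, 120000, 3000, 210000, 2600000)
)

yr <- as.integer(format(sales$sale_date, "%Y"))
mo <- as.integer(format(sales$sale_date, "%m"))

sales$year    <- yr
sales$quarter <- (mo - 1) %/% 3 + 1                  # 1 = Jan-Mar, ..., 4 = Oct-Dec
sales$months_since_00 <- (yr - 2000) * 12 + (mo - 1)  # Jan 2000 -> 0 (assumed convention)

# Keep 2000-2018 and drop outliers outside the $5,000-$2,500,000 range.
model_data <- subset(sales, year >= 2000 & year <= 2018 &
                            price >= 5000 & price <= 2500000)
model_data
```

Here the 2012 sale is dropped as a low-price outlier, the 2019 sale falls outside the modeling window, and the $2.6 million sale is dropped as a high-price outlier. That leaves the two sales of parcel A, at months 0 and 78.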
The log of sale price was modeled as a function of year-quarter fixed effects (2000-2018) and house fixed effects. The plot below shows the regression coefficients plotted against time (in year-quarters). Each coefficient is the average change in log sale price relative to the baseline quarter (January-March 2000), holding the house constant. A plot of these coefficients is what a repeat-sales index typically looks like.
```r
library(dplyr)
library(ggplot2)

# Plot the quarterly price index with its 95% confidence band.
# index.quarter.df is created earlier in the analysis (not shown here).
index.quarter.df %>%
  ggplot(aes(Quarter, Price, group = 1)) +
  geom_ribbon(aes(ymin = Confidence_2.5, ymax = Confidence_97.5), fill = "grey70") +
  geom_point(aes(colour = Significant)) +
  scale_color_manual(values = c("#266E75", "#266E75")) +
  geom_line() +
  labs(title = "Jefferson County Price Index: 2000 - 2018",
       subtitle = "95% Confidence Intervals included",
       y = "Price Index") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")
```
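One way to write the quarterly regression described above is shown below. It assumes an ordinary least-squares model with house and year-quarter fixed effects; the notation is ours, not the project's.

$$
\ln P_{it} = \alpha_i + \sum_{q=2}^{Q} \beta_q D_{q,t} + \varepsilon_{it},
\qquad
I_q = 100\, e^{\beta_q}, \quad I_1 = 100
$$

The terms are defined as follows:

* $P_{it}$ is the sale price of house $i$ sold in quarter $t$.
* $\alpha_i$ is the house fixed effect.
* $D_{q,t} = 1$ when $t = q$ and $0$ otherwise.
* Quarter 1 (January-March 2000) is the omitted baseline, so $\beta_1 = 0$.
* $Q = 76$ quarters cover 2000-2018.

For example, an estimated $\beta_q = 0.30$ gives $I_q = 100e^{0.30} \approx 135$. In other words, the same house would be expected to sell for about 35% more than in the baseline quarter.

If houses never change HMA, a separate HMA fixed effect would be collinear with $\alpha_i$ in this specification. An HMA-specific index would instead need quarter effects estimated within each HMA, for example by interacting quarter with HMA.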
Next, we calculated the sale price indices for 2000-2018. A repeat sales index is typically calculated with a linear regression, and sale price data from January 2000 were used as the baseline. The regression hypothesizes that a home's sale price at a given time is a function of a neighborhood fixed effect, a house fixed effect, and time controls such as month, quarter, or year. The time fixed effect controls for factors that vary over the years (2000-2018) and consequently accounts for inflation in the dollar value. The neighborhood fixed effect controls for factors that vary across housing market areas, such as the location of amenities or disamenities. The house fixed effect accounts for factors that differ from one property to another, such as house characteristics or whether a house may have been sold more than once within the study period. For this regression, we first split the data into a training set and a test set: the training set was obtained by randomly sampling 75% of the observations, and the remaining 25% formed the test set used for validation.
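The following is a minimal base-R sketch of the 75/25 split and a fixed-effects fit, run on a simulated panel. All names (`house_id`, `quarter`, `log_price`, `Predicted`) are hypothetical, and the project may have used different tooling. Only quarter and house effects are included, since a house effect already absorbs anything constant within its neighborhood. The sketch also shows one practical wrinkle: a model with house fixed effects can only score test sales whose house also appears in the training set.

```r
set.seed(1)

# Simulated panel (hypothetical): 200 houses, each sold 2-4 times in 12 quarters.
n_houses <- 200
n_q <- 12
true_beta <- seq(0, 0.3, length.out = n_q)   # true log index; quarter 1 = 0

sales <- do.call(rbind, lapply(seq_len(n_houses), function(h) {
  q <- sort(sample.int(n_q, sample(2:4, 1)))  # distinct sale quarters
  data.frame(house_id = h, quarter = q)
}))
house_effect <- rnorm(n_houses, mean = 12, sd = 0.4)
sales$log_price <- house_effect[sales$house_id] + true_beta[sales$quarter] +
  rnorm(nrow(sales), sd = 0.05)
sales$house_id <- factor(sales$house_id)
sales$quarter  <- factor(sales$quarter, levels = seq_len(n_q))

# Random 75% training / 25% test split.
train_idx <- sample.int(nrow(sales), size = round(0.75 * nrow(sales)))
train <- sales[train_idx, ]
test  <- sales[-train_idx, ]

# Log price on quarter and house fixed effects; quarter 1 is the reference level.
fit <- lm(log_price ~ quarter + house_id, data = train)
est_beta <- c(0, coef(fit)[paste0("quarter", 2:n_q)])

# predict() errors on factor levels unseen in training, so score only test
# sales whose house and quarter appear in the training data.
seen <- test$house_id %in% train$house_id & test$quarter %in% train$quarter
test <- test[seen, ]
test$Predicted <- exp(predict(fit, newdata = test))  # no retransformation bias correction
```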
Mean absolute percent error (MAPE) indicates the accuracy of the forecast: it measures, on average, by what percentage the predicted values deviate from the observed values. We calculated the absolute percent error for each sale in the test set, then averaged it for each year from 2000 to 2018, as shown in the plot below. MAPE values range between 0.4 and 0.7 over the study period, with the lowest MAPE observed around the 2008 recession.
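The following is a base-R sketch of the error metric and its yearly average. The column names `year`, `Price`, and `absPercentError` mirror those in the plotting code. However, `Predicted` and all values are hypothetical. The percent error is kept as a fraction (0.20 means 20%), which is an assumption about how the project stored it.

```r
# Absolute percent error per sale, as a fraction of the observed price.
abs_pct_error <- function(observed, predicted) {
  abs(predicted - observed) / observed
}

# Hypothetical test-set results (illustrative values only).
error <- data.frame(
  year      = c(2017, 2017, 2018, 2018),
  Price     = c(100000, 200000, 150000, 250000),
  Predicted = c(120000, 180000, 150000, 200000)
)
error$absPercentError <- abs_pct_error(error$Price, error$Predicted)

# MAPE by year: a base-R analogue of the group_by()/summarize() chunk.
mape_by_year <- aggregate(absPercentError ~ year, data = error, FUN = mean)
names(mape_by_year)[2] <- "MAPE"
mape_by_year
```

For 2017 the errors are 0.20 and 0.10 (MAPE 0.15); for 2018 they are 0 and 0.20 (MAPE 0.10).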
```r
library(dplyr)
library(ggplot2)

# Calculate MAPE by year.
error %>%
  group_by(year) %>%
  summarize(MAPE = mean(absPercentError, na.rm = TRUE)) %>%
  ggplot(aes(year, MAPE)) +
  geom_col(fill = "#3AA083") +
  ggtitle("MAPE by year")
```
MAPE was also mapped by housing market area, as shown in the plot below. Areas with the highest MAPE are colored light blue, and areas with the lowest MAPE are colored dark blue. Most housing market areas have similar shades of blue, except the West Core, Northwest Core, Downtown, and the three outer suburban market areas in the eastern part of Louisville, which are lighter. Because the model produces similar MAPE values across the darker blue housing market areas, it appears to work well and generalize across them.
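```r
library(dplyr)
library(ggplot2)
library(sf)

# MAPE by housing market area (HMA), joined to the HMA polygons for mapping.
# hma, palette5, and mapTheme() are defined earlier in the analysis.
error %>%
  group_by(H_Mkt_Area) %>%
  summarize(MAPE = mean(absPercentError, na.rm = TRUE)) %>%
  left_join(hma, by = c("H_Mkt_Area" = "H_Mkt_A")) %>%
  st_sf() %>%
  ggplot() +
  geom_sf(aes(fill = MAPE)) +
  scale_fill_gradientn(colours = palette5, na.value = "grey50",
                       guide = "colourbar", aesthetics = "fill") +
  ggtitle("MAPE by Housing Market Area") +
  mapTheme()
```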
```r
# Prediction error as a function of observed sale price, with a linear fit.
ggplot(error, aes(Price, Error)) +
  geom_point(color = "#2D3F50") +
  geom_smooth(method = "lm", color = "#83CE7B") +
  ggtitle("Error as a function of Price")
```
The error (the difference between predicted and observed sale prices) was plotted against each observed sale price. The plot above indicates that homes with higher sale prices have the largest errors, while the model works reasonably well for homes with lower market sale prices. This model can therefore be considered effective for the purposes of this project, since the aim is to identify lower-income neighborhoods that may become more expensive in the future.
With the model trained on space/time real estate trends from 2000 to 2018, we used it in a predictive context to forecast sale prices between 2019 and 2023. To derive a sample of houses for which sale prices can be forecast, we randomly sampled 100,000 homes sold since 2013. This sample size assumes that ~20,000 homes are sold each year, so about 100,000 homes would be sold in 5 years. Sales since 2013 were selected to represent the modern stock of parcels and homes. A random sale month from 2019-2023 was assigned to each of the 100,000 sampled homes. This random sample was used to predict future price indices while reducing selection bias, and the predicted prices were used to identify the change in sale price from 2018 to 2023.
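The sampling step above can be sketched in base R as follows. The pool of 150,000 post-2013 sales and the parcel IDs are hypothetical placeholders. The month encoding reuses the assumed "January 2000 = month 0" convention. The sketch only builds the forecast sample. It does not show how the trained model was extended to months after 2018, which the text does not specify.

```r
set.seed(2020)

# Hypothetical pool of sales since 2013 (IDs and pool size are placeholders).
recent_sales <- data.frame(parcel_id = sprintf("P%06d", 1:150000))

n_future <- 100000                    # ~20,000 sales/year x 5 years
future <- recent_sales[sample.int(nrow(recent_sales), n_future), , drop = FALSE]

# Assign each sampled home a random sale month from Jan 2019 to Dec 2023.
# With Jan 2000 = 0, these are months 228 through 287.
future_months <- (2019 - 2000) * 12 + 0:59
future$months_since_00 <- sample(future_months, n_future, replace = TRUE)
future$year <- 2000 + future$months_since_00 %/% 12
head(future)
```

Sampling parcels without replacement keeps each home at most once, while months are drawn with replacement so that any month can receive many sales.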
As shown in the plot below, the mean observed sale price gradually increased from 2000 to 2018, while the model’s mean predicted sale prices, highlighted in (INSERT COLOR), are expected to increase at a faster rate from 2019 to 2023.
```r
library(dplyr)
library(ggplot2)

# Mean observed (2000-2018) and predicted (2019-2023) price per month, all HMAs.
dat.monthly.prediction %>%
  group_by(months_since_00, Legend) %>%
  summarize(Mean_Predicted_Price = mean(Price, na.rm = TRUE)) %>%
  ggplot(aes(as.numeric(months_since_00), Mean_Predicted_Price, colour = Legend)) +
  geom_point() +
  geom_smooth(aes(colour = Legend), method = "loess") +
  scale_color_manual(values = c("#83CE7B", "#266E75")) +
  labs(title = "Jefferson County Average Home Prices per month: 2000 - 2023",
       subtitle = "95% Confidence Intervals included") +
  plotTheme() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
Comparing mean observed and predicted sale prices across urban core and non-urban core HMAs, as shown below, the trends are similar in these different markets, but the rate of change and the month-to-month variability of the predictions differ. For example, mean predicted home prices in the non-urban core are expected to increase at a faster rate than in the urban core, and prices vary less from month to month in the non-urban core.
```r
library(dplyr)
library(ggplot2)
library(gridExtra)

# Urban core vs. non-urban core. inner_core is assumed to be a vector of the
# urban-core HMA names defined earlier; %in% tests membership in that vector,
# whereas == would compare element-by-element with recycling.
grid.arrange(
  dat.monthly.prediction %>%
    filter(H_Mkt_Area %in% inner_core) %>%
    group_by(months_since_00, Legend) %>%
    summarize(Mean_Predicted_Price = mean(Price, na.rm = TRUE)) %>%
    ggplot(aes(as.numeric(months_since_00), Mean_Predicted_Price, colour = Legend)) +
    geom_point() +
    geom_smooth(aes(colour = Legend), method = "loess") +
    scale_color_manual(values = c("#83CE7B", "#266E75")) +
    labs(title = "Urban Core", subtitle = "95% Confidence Intervals included",
         x = "Months Since 2000",
         y = "Mean Sales Price") +
    ylim(0, 400000) +
    plotTheme() +
    # Applied after plotTheme() so these settings are not overridden.
    theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none"),
  dat.monthly.prediction %>%
    filter(!(H_Mkt_Area %in% inner_core)) %>%
    group_by(months_since_00, Legend) %>%
    summarize(Mean_Predicted_Price = mean(Price, na.rm = TRUE)) %>%
    ggplot(aes(as.numeric(months_since_00), Mean_Predicted_Price, colour = Legend)) +
    geom_point() +
    geom_smooth(aes(colour = Legend), method = "loess") +
    scale_color_manual(values = c("#83CE7B", "#266E75")) +
    labs(title = "Non-Urban Core", subtitle = "95% Confidence Intervals included",
         x = "Months Since 2000",
         y = "") +
    ylim(0, 400000) +
    plotTheme() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)),
  ncol = 2,
  top = "Jefferson County Average Home Prices per month (2000 - 2023)")
```
Plot of mean observed sale prices (2000-2018) and mean predicted sale prices (2019-2023) for each month, for each housing market area.
```r
library(dplyr)
library(ggplot2)

# Mean observed and predicted price per month, faceted by housing market area.
dat.monthly.prediction %>%
  group_by(H_Mkt_Area, months_since_00, Legend) %>%
  summarize(Mean_Predicted_Price = mean(Price, na.rm = TRUE)) %>%
  ggplot(aes(as.numeric(months_since_00), Mean_Predicted_Price, colour = Legend)) +
  geom_point(size = .1) +
  geom_smooth(aes(colour = Legend), method = "loess") +
  scale_color_manual(values = c("#83CE7B", "#266E75")) +
  facet_wrap(~H_Mkt_Area, scales = "free", ncol = 3) +
  labs(title = "Jefferson County Average Home Prices per Month (2000 - 2023) for Each Housing Market Area",
       subtitle = "95% Confidence Intervals included",
       x = "Months Since 2000",
       y = "Mean Sales Price") +
  plotTheme() +
  theme(strip.text = element_text(colour = "white"),
        axis.text.x = element_text(angle = 45, hjust = 1))
```
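```r
library(dplyr)
library(ggplot2)
library(sf)

# Standard error of the mean price (sd / sqrt(n), counting only non-missing
# prices) for each HMA, mapped separately for observed and predicted prices.
dat.monthly.prediction %>%
  group_by(H_Mkt_Area, Legend) %>%
  summarize(Standard_Error = sd(Price, na.rm = TRUE) / sqrt(sum(!is.na(Price)))) %>%
  left_join(hma, by = c("H_Mkt_Area" = "H_Mkt_A")) %>%
  st_sf() %>%
  ggplot() +
  geom_sf(aes(fill = Standard_Error)) +
  facet_wrap(~Legend, ncol = 1) +
  scale_fill_gradientn(colours = palette5, na.value = "grey50",
                       guide = "colourbar", aesthetics = "fill") +
  mapTheme()
```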
Finally, we calculated the mean percent change in home prices from 2018 to 2023 for each housing market area to determine which areas are expected to have the greatest opportunity: neighborhoods with the lowest home prices today but the greatest increase in home prices in the future. The darker blue areas in the map below, with higher percent changes in home sale prices, are more likely to offer greater opportunity for lower-income households than areas with lower or negative percent changes.
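The text does not spell out whether the HMA figure is the percent change of mean prices or the mean of per-home percent changes. The sketch below assumes the former: for each HMA, it compares the mean 2018 price with the mean 2023 price. The HMA names come from the text, but all prices are invented for illustration.

```r
# Hypothetical observed 2018 and predicted 2023 sale prices by HMA.
sales <- data.frame(
  H_Mkt_Area = c("Downtown", "Downtown", "Downtown",
                 "West Core", "West Core", "West Core"),
  year  = c(2018, 2018, 2023, 2018, 2023, 2023),
  Price = c(140000, 160000, 180000, 90000, 80000, 91000)
)

# Mean price per HMA and year, then one row per HMA with both years.
means <- aggregate(Price ~ H_Mkt_Area + year, data = sales, FUN = mean)
wide  <- reshape(means, idvar = "H_Mkt_Area", timevar = "year", direction = "wide")

# Percent change from the 2018 mean to the 2023 mean.
wide$Price_Change <- 100 * (wide$Price.2023 - wide$Price.2018) / wide$Price.2018
wide[, c("H_Mkt_Area", "Price_Change")]
```

Downtown's mean rises from $150,000 to $180,000 (+20%), while West Core's falls from $90,000 to $85,500 (-5%).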
```r
library(dplyr)
library(ggplot2)
library(sf)

# Map the 2018-2023 percent price change for each HMA.
price.change.hma %>%
  left_join(hma, by = c("H_Mkt_Area" = "H_Mkt_A")) %>%
  st_sf() %>%
  ggplot() +
  geom_sf(aes(fill = Price_Change)) +
  scale_fill_gradientn(colours = palette5, na.value = "grey50",
                       guide = "colourbar", aesthetics = "fill",
                       name = "Percent Price\nChange") +
  mapTheme()
```
So far, the LAHTF has invested X dollars in Y units in Louisville over the last 6 years (2014-2019). When comparing home price predictions with affordable housing investments, the goal is to invest in areas that will have the highest chance of opportunity. To determine whether these units are being developed in neighborhoods with the greatest opportunity, we calculated the percent change in home sale price (2018-2023) of properties within 1/2 mile of each LAHTF-funded project.
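The following is a minimal sketch of the half-mile neighborhood calculation. It assumes project and home coordinates are already in a projected coordinate system measured in feet, so half a mile is 2,640 feet. It also assumes that each home carries its own `Price_Change`. All points and values are hypothetical. With real geometries, the sf package's `st_is_within_distance()` or `st_buffer()` would typically handle the distance step instead of the hand-written Euclidean distance used here.

```r
half_mile_ft <- 2640   # 0.5 mile in feet; assumes projected coordinates in feet

# Hypothetical LAHTF project locations and nearby homes.
projects <- data.frame(project_id = c("P1", "P2"),
                       x = c(0, 10000), y = c(0, 0))
homes <- data.frame(x = c(100, 2000, 3000, 9000, 10500, 20000),
                    y = c(0, 1000, 0, 0, 500, 0),
                    Price_Change = c(10, 20, 99, -4, -6, 50))

# Mean predicted % change of homes within half a mile of each project
# (NaN if no home falls inside the radius).
projects$Price_Change <- sapply(seq_len(nrow(projects)), function(i) {
  d <- sqrt((homes$x - projects$x[i])^2 + (homes$y - projects$y[i])^2)
  mean(homes$Price_Change[d <= half_mile_ft])
})
projects
```

Project P1 averages the two homes within 2,640 feet (10 and 20, giving 15), and P2 averages its two nearby homes (-4 and -6, giving -5).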
As shown on the map below, a majority of affordable housing units financed by the LAHTF are in the North Core, Downtown, and Central Preston. However, the North Core is expected to have the least opportunity of all the HMAs the LAHTF has invested in. This is because average home prices surrounding most LAHTF projects in the North Core are expected to decline over the next five years. By contrast, prices are expected to increase around projects in Downtown and Central Preston, the areas with the greatest opportunity for investment. This prediction tool can help the LAHTF not only evaluate past investments but also identify the areas with the greatest opportunity for future investment.
```r
library(dplyr)
library(ggplot2)
library(sf)
library(gridExtra)

# Left: percent price change around each LAHTF project (quintiles).
# Right: share of all LAHTF-funded units located in each HMA.
# q5(), qBr(), palette5, and mapTheme() are helpers defined earlier.
grid.arrange(
  ggplot() +
    geom_sf(data = hma, color = "white", fill = "gray") +
    geom_sf(data = price.change.AH.geo, aes(color = q5(Price_Change)),
            show.legend = "point", size = 2) +
    scale_colour_manual(values = palette5,
                        labels = qBr(price.change.AH.geo, "Price_Change"),
                        name = "Quintile\nBreaks") +
    labs(title = "Percent Change in Home Sale Price (2018-23)",
         subtitle = "Based on Housing Trust Fund projects",
         caption = "") +
    mapTheme(),
  price.change.AH.geo %>%
    st_set_geometry(NULL) %>%
    group_by(H_Mkt_A) %>%
    summarise(count = n(),
              Total_Units = sum(TtlUnts),
              Total_Dollars = sum(Amount),
              P_Units = Total_Units / sum(price.change.AH.geo$TtlUnts) * 100) %>%
    right_join(hma, by = c("H_Mkt_A" = "H_Mkt_A")) %>%
    st_as_sf() %>%
    ggplot() +
    geom_sf(aes(fill = P_Units)) +
    scale_fill_gradientn(colours = palette5, na.value = "grey50",
                         guide = "colourbar", aesthetics = "fill",
                         name = "Percent\n of Units") +
    labs(title = "Percentage of Affordable Housing Units per HMA",
         subtitle = "") +
    mapTheme(),
  ncol = 2)
```
[INCLUDE 1 EXAMPLE ON HOW TO USE THE APPLICATION]